A Multiprocessor Architecture Combining Fine-Grained and Coarse-Grained Parallelism Strategies
Author
Abstract
A wide variety of computer architectures have been proposed that attempt to exploit parallelism at different granularities. For example, pipelined processors and multiple-instruction-issue processors exploit the fine-grained parallelism available at the machine instruction level, while shared-memory multiprocessors exploit the coarse-grained parallelism available at the loop level. Using a register-transfer level simulation methodology, this paper examines the performance of a multiprocessor architecture that combines both coarse-grained and fine-grained parallelism strategies to minimize the execution time of a single application program. These simulations indicate that the best system performance is obtained by using a mix of fine-grained and coarse-grained parallelism in which any number of processors can be used, but each processor should be pipelined to a degree of 2 to 4, or each should be capable of issuing from 2 to 4 instructions per cycle. These results suggest that current high-performance microprocessors, which can typically have 2 to 4 instructions executing simultaneously, may provide excellent components with which to construct a multiprocessor system.
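To make the interaction between the two granularities concrete, the following is a minimal back-of-the-envelope sketch, not the register-transfer level simulator used in the paper. It assumes an Amdahl-style model for the coarse-grained (loop-level) part and a simple saturating model for the fine-grained (instruction-level) part. The function name `combined_speedup` and the parameters `parallel_fraction` and `ilp_limit` are hypothetical and do not come from the paper, and any values passed to them are illustrative only.

```python
def combined_speedup(processors, issue_width, parallel_fraction, ilp_limit):
    """Toy estimate of speedup over one single-issue processor.

    Assumptions (not from the paper):
      * each processor gains min(issue_width, ilp_limit) from
        fine-grained parallelism, i.e. the gain saturates once the
        issue width exceeds the available instruction-level parallelism;
      * the coarse-grained gain follows Amdahl's law for the fraction
        of work that can be spread across processors.
    """
    if processors < 1 or issue_width < 1:
        raise ValueError("processors and issue_width must be >= 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")

    # Fine-grained gain per processor, capped by the assumed ILP limit.
    fine = min(float(issue_width), float(ilp_limit))

    # Coarse-grained gain across processors (Amdahl's law).
    coarse = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)

    # Every instruction stream runs `fine` times faster, so the gains multiply.
    return fine * coarse


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from the paper.
    for width in (1, 2, 4, 8):
        print(width, combined_speedup(4, width, 0.9, 3.0))
```

Under these assumptions, widening each processor beyond the assumed `ilp_limit` yields no further gain, while adding processors continues to help only the parallel fraction of the program. The model says nothing about pipeline hazards, memory contention, or synchronization costs, which a detailed simulation would capture.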
Similar resources
Extended Parallelism Models For Optimization On Massively Parallel Computers
Single-level parallel optimization approaches, those in which either the simulation code executes in parallel or the optimization algorithm invokes multiple simultaneous single-processor analyses, have been investigated previously and have been shown to be effective in reducing the time required to compute optimal solutions. However, these approaches have clear performance limitatio...
Exploiting Coarse-Grain Parallelism in the MPEG-2 Algorithm
As the demand for multimedia applications increases, the performance of algorithms such as MPEG-2 video compression on general-purpose microprocessors is becoming more important. In this paper, we propose a number of coarse-grained parallel implementations of MPEG-2 decoding and encoding. We evaluate the performance of these implementations on a single-chip multiprocessor, and compare the perfo...
Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization
The quest to improve performance forces designers to explore finer-grained multiprocessor machines. Ever increasing chip densities based on CMOS improvements fuel research in highly parallel chip multiprocessors with 100s of processing elements. With such increasing levels of parallelism, synchronization is set to become a major performance bottleneck and efficient support for synchronization a...
Rationale, Design and Performance of the Hydra Multiprocessor
In Hydra four high performance processors communicate via a shared secondary cache. The shared cache is implemented using multichip module (MCM) packaging technology. The Hydra multiprocessor is designed to efficiently support automatically parallelized programs that have high degrees of fine grained sharing. This paper motivates the Hydra multiprocessor design by reviewing current trends in ar...
Multilevel Parallelism for Optimization on Mp Computers: Theory and Experiment
Parallel optimization approaches which exploit only a single type of parallelism (e.g., a single simulation instance executes in parallel or an optimization algorithm manages concurrent serial analyses) have clear performance limitations that prevent effective scaling with the thousands of processors available in massively parallel (MP) supercomputers. This motivated the development of a two-le...
Journal title:
- Parallel Computing
Volume 20, Issue
Pages -
Publication date: 1994